Confounds and Consequences in Geotagged Twitter Data

Authors

  • Umashanthi Pavalanathan
  • Jacob Eisenstein
Abstract

Twitter is often used in quantitative studies that identify geographically-preferred topics, writing styles, and entities. These studies rely on either GPS coordinates attached to individual messages, or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40.


Related Resources

Understanding Human Mobility from Twitter

Understanding human mobility is crucial for a broad range of applications from disease prediction to communication networks. Most efforts on studying human mobility have so far used private and low resolution data, such as call data records. Here, we propose Twitter as a proxy for human mobility, as it relies on publicly available data and provides high resolution positioning when users opt to ...


Geotagged US Tweets as Predictors of County-Level Health Outcomes, 2015-2016.

OBJECTIVES To leverage geotagged Twitter data to create national indicators of the social environment, with small-area indicators of prevalent sentiment and social modeling of health behaviors, and to test associations with county-level health outcomes, while controlling for demographic characteristics. METHODS We used Twitter's streaming application programming interface to continuously coll...


Twitter Event Photo Detection Using both Geotagged Tweets and Non-geotagged Photo Tweets

In this paper, we propose a system to detect event photos using geotagged tweets and non-geotagged photo tweets. In our previous work, only geotagged photo tweets were used for event photo detection, and their ratio to the total number of tweets was very limited. In the proposed system, we use geotagged tweets without photos for event detection, and non-geotagged photo tweets for event photo detection...


Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity.

OBJECTIVES Using publicly available, geotagged Twitter data, we created neighborhood indicators for happiness, food, and physical activity for three large counties: Salt Lake, San Francisco, and New York. METHODS We utilize 2.8 million tweets collected between February and August 2015 in our analysis. Geo-coordinates of where tweets were sent allow us to spatially join them to 2010 census tract loc...


Real-time analysis application for identifying bursty local areas related to emergency topics

Since social media started attracting more attention from Internet users, it has become one of the most important information sources in the world. In particular, with the increasing popularity of social media, data posted on social media sites are rapidly becoming collective intelligence, a term used to refer to new media that is displacing traditional media. In this paper, we...



Publication date: 2015